Question 1 [10 marks]


An allele is a variant form of a given gene. The relative frequencies of the two alleles change 
from generation to generation within a population. When one of the alleles disappears from 
the population the other allele is said to be “fixed”. In a small population fixation can 
occur within a few generations. 

The number of generations before an allele becomes fixed in a hypothetical population was
simulated in R. The result, the number of generations to fixation over 100 simulations, was
returned in a vector sim.data. A histogram of sim.data is shown in Figure 1.


[Histogram of sim.data: x-axis "generations to fixation" (0 to 60), y-axis "count"]


Figure 1: Histogram of sim.data 
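
The paper does not show the code that produced sim.data. As a rough illustration only, the
sketch below simulates generations to fixation under a simple Wright-Fisher drift model. The
function name generations.to.fixation, the number of gene copies (20) and the starting
frequency (0.5) are all hypothetical choices, not values from the paper.

```r
# Hypothetical sketch: genetic drift for one gene with two alleles (A and a).
# n.copies and p0 are illustrative assumptions, not taken from the exam paper.
generations.to.fixation <- function(n.copies = 20, p0 = 0.5) {
  count <- round(n.copies * p0)  # copies of allele A in the current generation
  gen <- 0
  # Continue until allele A is either lost (0) or fixed (n.copies).
  while (count > 0 && count < n.copies) {
    # Each copy in the next generation is A with probability count / n.copies.
    count <- rbinom(1, size = n.copies, prob = count / n.copies)
    gen <- gen + 1
  }
  gen
}

set.seed(1)
sim.example <- replicate(100, generations.to.fixation())
summary(sim.example)
```

A vector like sim.example plays the role of sim.data: one entry per simulation, each entry
the number of generations until one allele disappeared.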




STAT470 Trimester 2, 2017 


A bootstrap method was used to find a 95% confidence interval for the mean of the number
of generations to fixation. R code implementing the method is given in Table 1.


R = 9999
N = length(sim.data) ## = 100

Tstar.mean = rep(NA, R)
for (i in 1:R) {
  datastar = sample(sim.data, N, replace=T)
  Tstar.mean[i] = mean(datastar)
}
mu = mean(sim.data)
sigma = sd(sim.data)

boot.CI = quantile(Tstar.mean, p=c(0.025,0.975))


Table 1: R code for the bootstrap CI 
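
Because sim.data itself is not reproduced in the paper, the code in Table 1 cannot be run as
printed. The sketch below applies the same resampling steps to a stand-in vector; the name
sim.example, the seed and the geometric distribution used to generate it are assumptions
made only so that the example is self-contained.

```r
# Stand-in data: sim.data is not given in the paper, so this hypothetical
# vector of 100 positive counts is used purely for illustration.
set.seed(470)
sim.example <- rgeom(100, prob = 1 / 14) + 1

R <- 9999
N <- length(sim.example)

# Each replicate resamples N values with replacement and records the mean.
Tstar.mean <- replicate(R, mean(sample(sim.example, N, replace = TRUE)))

# Percentile interval: the 2.5% and 97.5% quantiles of the resampled means.
boot.CI <- quantile(Tstar.mean, probs = c(0.025, 0.975))
boot.CI
```

Here replicate() replaces the explicit for loop of Table 1; the resampling and the quantile
step are otherwise the same.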


(a) With reference to the R code in Table 1, explain briefly how the vector Tstar.mean
is generated. [2 marks]

(b) Explain how the data in the vector Tstar.mean are to be interpreted. [2 marks]

(c) Explain the rationale behind the calculation of the bootstrap confidence interval used
in the R code in Table 1. [2 marks]


(d) Output from the R code in Table 1 is given below. 


mu = 13.88 
sigma = 9.43
boot.CI = 12.13 15.81 


(i) Use the normal approximation to calculate a 95% confidence interval for the
mean of the number of generations to fixation. [2 marks]

(ii) Compare the bootstrap and normal confidence intervals, accounting for any
differences or similarities between them. [2 marks]
Question 2 [10 marks]


Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a Gamma$(2, \beta)$ distribution with pdf

$f(y) = \frac{1}{\beta^2}\, y\, e^{-y/\beta}, \quad y > 0.$

(a) Find the likelihood $L(\beta \mid y_1, y_2, \ldots, y_n)$. [1 mark]

(b) Show that $\sum_{i=1}^{n} Y_i$ is sufficient for $\beta$. Briefly explain what this means. [3 marks]

(c) Show that the maximum likelihood estimator for $\beta$ is

$\hat{\beta} = \frac{1}{2n} \sum_{i=1}^{n} Y_i.$

[2 marks]

(d) Show that $\hat{\beta}$ is an unbiased estimator of $\beta$. [2 marks]

(e) Show that $\hat{\beta}$ is a consistent estimator of $\beta$. [2 marks]


Question 3 [10 marks]


(a) In the 19th century the Swiss astronomer Rudolf Wolf recorded the results of 100,000
throws of a single die. His results are tabulated below:

Face     |      1       2       3       4       5       6
Observed | 16,632  17,700  15,183  14,393  17,707  18,385


The observed value of the $\chi^2$ statistic for Wolf’s data is about 750. Noting that a $\chi^2$
distribution with $n$ degrees of freedom is the same as a Gamma$(n/2, 2)$ distribution,
what can you conclude about the “fairness” of Wolf’s die? Carefully explain your
reasoning. [4 marks]
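
As a check on the quoted statistic, the sketch below recomputes the Pearson statistic from
the tabulated counts under equal expected counts, and confirms numerically that R's
chi-squared and Gamma distribution functions agree when the Gamma shape is n/2 and the scale
is 2. The evaluation points in q are arbitrary, and taking n = 5 (six faces minus one) is an
assumption of this example.

```r
# Wolf's observed counts for faces 1 to 6, as tabulated in the question.
observed <- c(16632, 17700, 15183, 14393, 17707, 18385)
expected <- rep(sum(observed) / 6, 6)  # fair die: equal expected counts

# Pearson statistic: sum over faces of (O - E)^2 / E.
x2 <- sum((observed - expected)^2 / expected)
x2

# Check the chi-squared / Gamma relationship at a few arbitrary points,
# assuming n = 5 degrees of freedom (shape = n / 2, scale = 2).
q <- c(1, 5, 11, 20)
same <- all.equal(pchisq(q, df = 5), pgamma(q, shape = 5 / 2, scale = 2))
same
```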


(b) The following table categorises car accidents by weight of car and severity of driver’s
injury.

                  Weight of car in tonnes
Injury      |  <1.2  1.2-1.4  >1.4 | Total
very severe |    34       22     8 |    64
average     |    43       41    47 |   131
moderate    |    51       60    50 |   161
Total       |   128      123   105 |   356


You are provided the following R output: 


        Pearson's Chi-squared test

data:  .Table
X-squared = 15.4, df = 4, p-value = 0.0040
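
The output above can be reproduced approximately from the table of counts. The sketch below
assumes that the middle cell of the first row is 22, the value implied by the row total (64)
and the column total (123); the object names accidents and test are illustrative.

```r
# Counts by injury severity (rows) and car weight in tonnes (columns).
# The value 22 is inferred from the row and column totals (an assumption).
accidents <- matrix(c(34, 22,  8,
                      43, 41, 47,
                      51, 60, 50),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(Injury = c("very severe", "average", "moderate"),
                                    Weight = c("<1.2", "1.2-1.4", ">1.4")))

# Pearson chi-squared test of independence (no continuity correction
# is applied for tables larger than 2 x 2).
test <- chisq.test(accidents)
test
```

The degrees of freedom reported for a 3 x 3 table are (3 - 1) x (3 - 1) = 4.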


(i) State the hypothesis for independence of factors in the data set. [1 mark]

(ii) Under the hypothesis in (i), what is the expected number of moderate injuries
in accidents involving a car of weight greater than 1.4 tonnes? [1 mark]

(iii) Explain how the output should be interpreted. [2 marks]

(iv) Write a conclusion. [2 marks]
Question 4 [10 marks]


These data are imports of petroleum in barrels per day from 1994 to 2002. 


year (x)   | 1994  1995  1996  1997  1998  1999  2000  2001  2002
amount (y) | 1728  1573  1604  1755  2136  2464  2488  2761  2269


The regression model for these data centres the years around 1998, i.e.,

$Y = \beta_0 + \beta_1 (x - 1998) + \epsilon$


A linear regression was performed giving the following estimates: 


# coefficients
(Intercept)  2086
year          137

# variance-covariance matrix
            (Intercept)  year
(Intercept)        6464   0.0
year                0.0   970
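
The estimates above can be checked by fitting the centred model to the tabulated data. The
sketch below is one way to do this with lm(); the variable names year, amount and fit are
illustrative, and the rounded results should agree with the printed output up to rounding.

```r
# Petroleum imports (barrels per day), 1994 to 2002, from the table above.
year <- 1994:2002
amount <- c(1728, 1573, 1604, 1755, 2136, 2464, 2488, 2761, 2269)

# Centre the predictor at 1998 so the intercept is the fitted value for 1998.
fit <- lm(amount ~ I(year - 1998))

round(coef(fit))  # intercept and slope
round(vcov(fit))  # estimated variance-covariance matrix of the coefficients
```

Because the centred years sum to zero, the off-diagonal entries of the variance-covariance
matrix come out as zero up to floating-point error.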


(a) The model can be re-written in matrix form as $Y = X\beta$. Write $\beta$ and the first three
rows of the design matrix $X$. [2 marks]

(b) Write down the equation of the regression line. Estimate the expected amount of
petroleum $Y_{2004}$ imported in 2004. [2 marks]

(c) Show that $\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = 0$ for this model. [2 marks]

(d) Find $\mathrm{Var}(Y_{2004})$. [2 marks]

(e) Calculate a 95% confidence interval for $Y_{2004}$. Use $t_{7,0.95} = 2.36$. [2 marks]


Question 5 [10 marks]
Atmospheric ozone concentrations in parts per hundred million were measured in 3
commercial lettuce gardens on 10 occasions and the exploratory data plot is shown in Figure 2.

Figure 2: Ozone concentrations from 3 gardens


You are provided the following R output: 


Response: Ozone
          Df Sum Sq Mean Sq F value  Pr(>F)
Garden     2   62.1    31.0       * 2.3e-05
Residuals 27   51.4     1.9

  Estimate 2.5 % 97.5 %
A      3.0   2.1    3.9
B      5.1   4.2    6.0
C      6.5   5.6    7.4


(a) Calculate the missing F value in the ANOVA table. [1 mark]

(b) State the null and alternative hypotheses. Use subscripts to denote specific group
means. [2 marks]

(c) Explain the interpretation of the ANOVA p-value. [2 marks]

(d) Verify the 95% confidence interval for Garden C. Show all calculations and use
$t_{27,0.95} = 2.05$. [2 marks]

(e) Write a conclusion. [3 marks]


Formulae are given on page 9 and Distributions on page 10 


Please remember: This examination paper MUST BE HANDED IN. Failure to do so
may result in the cancellation of all marks for this examination. Writing your name and
number on the front will help us confirm that your paper has been returned.
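Formulae

$z = \frac{x - \mu}{\sigma}$    $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

$t = \frac{\bar{x} - \mu}{\mathrm{se}(\bar{x})}$    $\mathrm{se}(\bar{x}) = \frac{s}{\sqrt{n}}$

$t = \frac{\bar{d} - \mu_d}{\mathrm{se}(\bar{d})}$    $\mathrm{se}(\bar{d}) = \frac{s_d}{\sqrt{n}}$

$\bar{x} \pm t^* \times \mathrm{se}(\bar{x})$    $\mathrm{se}(\bar{x}_i) = \sqrt{\frac{MSE}{n_i}}$

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\mathrm{se}(\bar{x}_1 - \bar{x}_2)}$    $\mathrm{se}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

$(\bar{x}_1 - \bar{x}_2) \pm t^* \times \mathrm{se}(\bar{x}_1 - \bar{x}_2)$    $\mathrm{se}(\bar{x}_1 - \bar{x}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$    margin of error $= z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

$(\hat{p}_1 - \hat{p}_2) \pm z^* \times \mathrm{se}(\hat{p}_1 - \hat{p}_2)$    $\mathrm{se}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$    $E_i = n p_i$

$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$    $E_{ij} = \frac{R_i \times C_j}{n}$

$t = \frac{b_j}{\mathrm{se}(b_j)}$    $b_j \pm t^* \, \mathrm{se}(b_j)$

$\Gamma(t + 1) = t\,\Gamma(t)$    $\Gamma(n) = (n - 1)!$

$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)$    $\mathrm{Var}(aX + bY) = a^2\, \mathrm{Var}\, X + b^2\, \mathrm{Var}\, Y + 2ab\, \mathrm{Cov}(X, Y)$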


Distributions 


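Distribution | Probability Function | Mean | Variance | $M_X(t)$
bin$(n, p)$ | $p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$ | $np$ | $np(1 - p)$ | $(1 - p + p e^t)^n$
Pois$(\mu)$ | $p(x) = \frac{\mu^x e^{-\mu}}{x!}$, $x = 0, 1, 2, \ldots$ | $\mu$ | $\mu$ | $\exp[-\mu\{1 - \exp(t)\}]$
geom$(p)$ | $p(x) = p(1 - p)^{x - 1}$, $x = 1, 2, \ldots$ | $1/p$ | $(1 - p)/p^2$ | $p e^t / \{1 - (1 - p) e^t\}$
N$(\mu, \sigma^2)$ | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\{-\frac{1}{2}(\frac{x - \mu}{\sigma})^2\}$ | $\mu$ | $\sigma^2$ | $e^{\mu t + \frac{1}{2}\sigma^2 t^2}$
U$(a, b)$ | $f(x) = \frac{1}{b - a}$, $a < x < b$ | $\frac{a + b}{2}$ | $\frac{(b - a)^2}{12}$ | $\frac{e^{bt} - e^{at}}{t(b - a)}$
exp$(\beta)$ | $f(x) = \frac{1}{\beta} e^{-x/\beta}$ | $\beta$ | $\beta^2$ | $(1 - \beta t)^{-1}$ for $t < 1/\beta$
Gamma$(\alpha, \beta)$ | $f(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha} x^{\alpha - 1} e^{-x/\beta}$ | $\alpha\beta$ | $\alpha\beta^2$ | $(1 - \beta t)^{-\alpha}$ for $t < 1/\beta$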




